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Abstract — The box-constrained integer least squares problem 
(BILS) arises in MIMO wireless communications applications. 
Typically a sphere decoding algorithm (a tree search algorithm) is 
used to solve the problem. In order to make the search algorithm 
more efficient, the columns of the channel matrix in the BILS 
problem have to be reordered. To our knowledge, there are 
currently two algorithms for column reordering that provide 
the best known results. Both use all available information, but 
they were derived respectively from geometric and algebraic 
points of view and look different. In this paper we modify 
one to make it more computationally efficient and easier to 
comprehend. Then we prove the modified one and the other 
actually give the same column reordering in theory. Finally we 
propose a new mathematically equivalent algorithm, which is 
more computationally efficient and is still easy to understand. 

I. Introduction 

Given a real vector y G M™ and a real matrix H G M™^", 
integer vectors l,u G with I < u, the box-constrained 
integer least squares (BILS) problem is defined as: 



min||y--ffx||2, 



(1) 



where B = Bi x ■ ■ ■ x Bn with Bi = {xi G Z : k < Xi < Ui}. 
This problem arises in wireless communications applications 
such as MIMO signal decoding. In this paper, we assume that 
H has full column rank. The set {w — Hx : x G Z"} is 
referred to as the lattice generated by H. 
Let H have the QR factorization 
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where [Qi, Q2] G M™x" is orthogonal and R eW 

n m — n 

upper triangular. Then, with y = Qjy the BILS problem ([TJ 
is reduced to 

min ||y - i?a;||2. (2) 

x£B 

To solve this reduced problem sphere decoding search algo- 
rithms (see, e.g., fl], ^ and |[3|) enumerate the elements in 
B in some order to find the optimal solution. 

If we reorder the columns of H, i.e., we apply a permutation 
matrix P to H from the right, then we will obtain a different 
R-factor, resulting in different search speed. A few algorithms 
have been proposed to find P to minimize the complexity of 



the search algorithms. In O], the well-known V-BLAST col- 
umn reordering strategy originally given in (0] was proposed 
for this purpose. In [3 1, the SQRD column reordering strategy 
originally presented in [51 for the same purpose as V-BLAST, 
was proposed for this purpose. Both strategies use only the 
information of the matrix H. 

In |f6l, Su and Wassell considered the geometry of the BILS 
problem for the case that H is nonsingular and proposed a new 
column reordering algorithm (to be called the SW algorithm 
from here on for convenience) which uses all information 
of the BILS problem ([T]i. Unfortunately, in our point of 
view, the geometric interpretation of this algorithm is hard 
to understand. Probably due to page limit, the description of 
the algorithm is very concise, making efficient implementation 
difficult for ordinary users. 

In this paper we will give some new insight of the SW 
algorithm from an algebraic point of view. We will make some 
modifications so that the algorithm becomes more efficient and 
easier to understand and furthermore it can handle a general 
full column rank H. 

Independently Chang and Han in f3\ proposed another 
column reordering algorithm (which will be referred to as 
CH). Their algorithm also uses all information of dl} and the 
derivation is based on an algebraic point of view. It is easy to 
see from the equations in the search process exactly what the 
CH column reordering is doing and why we should expect 
a reduced complexity in the search process. The detailed 
description of the CH column reordering is given in [3 1 and 
it is easy for others to implement the algorithm. But our 
numerical tests indicated CH has a higher complexity than 
SW, when SW is implemented efficiently. Our numerical tests 
also showed that CH and SW almost always produced the 
same permutation matrix P. 

In this paper, we will show that the CH algorithm and the 
(modified) SW algorithm give the same column reordering 
in theory. This is interesting because both algorithms were 
derived through different motivations and we now have both 
a geometric justification and an algebraic justification for why 
the column reordering strategy should reduce the complexity 
of the search. Furthermore, using the knowledge that certain 
steps in each algorithm are equivalent, we can combine the 
best parts from each into a new algorithm. The new algorithm 



has a lower flop count than either of the originals. This is 
important to the successive interference cancellation decoder, 
which computes a suboptimal solution to ([T]i. The new algo- 
rithm can be interpreted in the same way as CH, so it is easy 
to understand. 

In this paper, denotes the i*^ column of the identity matrix 
/. For a set of integer numbers S and real number x, lx~\s 
denotes the nearest integer in 5 to x and if there is a tie it 
denotes the one which has smaller magnitude. For z E S, S\z 
denotes iS after z is removed. We sometimes use MATLAB- 
like notation for matrices and vectors, e.g., Ai-m.i:7i denotes 
the matrix formed by the first m rows and n columns of the 
matrix A and A:,i:„ denote the matrix formed by the first n 
columns of A. The j*^^ column of a matrix A is demoted either 
by flj or A- J. 

II. Search Process 

Both CH and SW column reordering algorithms use ideas 
that arise from the search process. Before the column reorder- 
ings are introduced, it is important to have an understanding 
of the sphere decoding search process. 

Consider the ILS problem We would like to enumerate 
the elements in B in an efficient manner in order to find the 
solution X. One such enumeration strategy is described in [|3]. 
We will now describe it briefly. 

Suppose that the solution satisfies the following bound, 

\\y-Rx\\l<^. (3) 

There are a few ways to choose a valid initial value for /3, see, 
e.g., Ul. The inequality ^ defines an ellipsoid in terms of x 
or a hyper-sphere in terms of the lattice point w ~ Rx with 
radius (3. Define 

n 

Cfc = (i/fc - ^ rkjXj)/rkk, k = n,n-l,...,l, (4) 

j=fc+i 

where when k — n the sum in the right hand side does not 
exist. Then (O can be rewritten as 

n 

J2^kkiXk -Ckf </3, 
k=l 

which implies the following set of inequalities: 

n 

level k : rl^{xk ~ c^f < P - '^^^(^' " ^^)'' 

i=k+l 

for fc = n, n — 1, . . . , 1. 

We begin the search process at level n. Choose a;„ = 
[c„]e„, the nearest integer in Z5„ to c„. If the inequality (|5]l 
with A; = n is not satisfied, it will not be satisfied for any 
integer, this means (3 was chosen to be too small, it must be 
enlarged. With a;„ fixed, we can move to level n~l and choose 
Xn-i = \cn-i\B„-i with c„_i Calculated as in (|4]l. At this 
point it is possible that the inequality Q is no longer satisfied. 
If this is the case, we must move back to level n and choose 
Xn to be the second nearest integer to c„. We will continue 
this procedure until we reach level 1, moving back a level if 
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Fig. 1. An example of the search process with solution x = [—1, 3, 1]"^. 

ever the inequality for the current level is no longer satisfied. 
When we reach level 1, we will have found an integer point x. 
We then update (3 — \\y ~ Rx\\2 and try to find a better integer 
point which satisfies the box-constraint in the new ellipsoid. 
Finally in the search process, when we can no longer find any 
Xn to satisfy ^ with k = n, the search process is complete 
and the last integer point x found is the solution. 

The above search process is actually a depth-first tree search, 
see Fig.[T] where the number in a node denote the step number 
at which the node is encountered. 

III. Column Reordering 

In this section we introduce the two orginal column reorder- 
ing algorithms, CH and SW and explain their motivations. We 
give some new insight on SW and propose a modified version. 
We also give a complexity analysis for both algorithms. 

A. Chang and Han 's Algorithm 

The CH algorithm first computes the QR factorization H, 
then tries to reorder the columns of R. The motivation for this 
algorithm comes from observing equation (|5). If the inequality 
is false we know that the current choice for the value of Xk 
given Xk+i:n are fixed is incorrect and we prune the search 
tree. We would like to choose the column permutations so 
that it is likely that the inequality will be false at higher levels 
in the search tree. The CH column reordering strategy does 
this by trying to maximize the left hand side of Q with large 
values of \rkk\ and minimize the right hand side by making 
l^kki^k — Cfc)| large for values of fc = n, n — 1, . . . , 1. 

Here we describe step 1 of the CH algorithm, which deter- 
mines the last column of the final R (or equivalently the last 
column of the final H). Subsequent steps are the same but are 
appUed to a subproblem that is one dimension smaller In step 
1 , for i = 1 , . . . , n we interchange columns i and n of R (thus 
entries of i and n in x are also swapped), then return R to 
upper-triangular by a series of Givens rotations applied to R 
from the left, which are also applied to y. To avoid confusion, 
we denote the new R by R and the new y by y. We then 
compute Cn = yn/fn,n and 

x1 = arg min \fnn{xi - c„)| = [cn\„, , (6) 

where the superscript c denotes the CH algorithm. Let x^ be 
the second closest integer in Bi to c„, i.e., x1 — LcnlB.\a;c ■ 
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(a) H - original column ordering (b) H - columns swapped 

Fig. 2. Geometry of the search with two different column ordering. 



Define 



dist^ = \fnn{Xi - C„)|, 



(7) 



which represents the partial residual given when Xi is taken 
to be x^. Let j = argmax^dist^. Then column j of the 
original R is chosen to be the n*^ column of the final R. 
With the corresponding updated upper triangular R and y (here 
for convenience we have removed hats), the algorithm then 
updates yi-.n-i again by setting yi-.n-i ■= yi:n-i-ri:n-i,nXj 
where Xj = a;J. Choosing Xj to be x| here is exactly the same 
as what the search process does. We then continue to work on 
the subproblem 
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Rl:n- 



l,l:n-ia; 



(8) 



where x = [xi, . . . ,Xj^i,Xn,Xj+i, . . .Xn-i]'^ satisfies the 
corresponding box constraint. The pseudocode of the CH 
algorithm is given in Algorithm [T] 

To determine the last column, CH finds the permutation to 
maximize |r„„(a;f — c„)|. Using instead of ensures that 
\x^ — Cn\ is never less than 0.5 but also not very large. This 
means that usually if |r„„(a;^ — c„)| is large, |r„„| is large as 
well and the requirement to have large |r„„| is met. Using x^ 
would not be a good choice because — c„| might be very 
small or even 0, then column i would not be chosen to be 
column n even if the corresponding |r„„| is large and on the 
contrary a column with small |r„„| but large — c„| may 
be chosen. 

Now we will consider the complexity of CH. The significant 
cost comes from line|9]in AlgorithmlT] which requires 6{k—i)^ 
flops. If we sum this cost over all loop iterations and add the 
cost of the QR factorization by Householder transformations, 
we get a total complexity of 0.5n'^ + 2ran? flops. 

B. Su and Wassell 's Algorithm 

The motivation for the SW algorithm comes from examining 
the geometry of the search process. 

Fig. 12] shows a 2-D BILS problem; |2a) represents the 
original column ordering and |2lb) is after the columns have 
been swapped. 



H=[R] and 



Algorithm 1 CH Algorithm - Returns p, the column permu- 
tation vector 

p := \ : n 
p' :— 1 : n 

Compute the QR factorization of H: 

compute y := Qjy 
for /j = n to 2 do 
maxDist := —1 
for i = 1 to fc do 

y yv.k 

R '■= Rl:k,l:k 

swap columns i and k of R, return it to upper tri- 
angular with Givens rotations, also apply the Givens 
rotations to y 

[yk/fk,k] 



10: 
11: 
12: 

13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21 
22: 
23: 
24: 
25: 
26: 



disti := \ fk,kXi - yk\ 
if disfi > maxDist then 
maxDist :— disti 

j ■■= i 
R' := R 

y' ■= y 
end if 
end for 

Pk ■■= P'j 

Interchange the intervals Bk and Bj 
Intechange entries k and j in p' 

Rl:k,l:k ■= R' 

yi:k y' - R[..k,k^"j 
end for 

Pi ■= Pi 



In the SW algorithm H = [hi, 
square and non-singular Let 



. , hn] is assumed to be 



For any integer a, 16) defines the affine sets, Fi(a) — 
{w I gf{w — hio) ~ 0}. The lattice points generated 
by H occur at the intersections of these affine sets. Let 
the orthogonal projection of a vector s onto a vector t be 
denoted as projj(s), then the orthogonal projection of some 
vector s onto Fi{a) is W'^]Fi{a){^) = s — proj^. (s — hio). 
Therefore the orthogonal distance between s and Fi{a) is 
dist(s, Fi(a)) — ||s— pi"oj^.(jj)(s)||2- In the points labeled 
proj^^(.]^^(y) and W'^]Fo{-i){y) 12 are called residual 

targets and "represent the components [of y} that remain after 
an orthogonal part has been projected away." 

Note that F2{a) in Fig. |2] is a sublattice of dimension 1. 
Algebraically it is the lattice generated by H with column 2 
removed. It can also be thought of as a subtree of the search 
tree where X2 = a has been fixed. In the first step of the 
search process for a general case, Xn is chosen to be Xn = 
argminQ,ge^ dist(j/, F„(a)); thus F„(x„) is the nearest affine 
set to y. Actually the value of x„ is identical to [c„] given 



in Section [III which will be proved later. Then y is updated as 
y projj^^^j,^ •)(?/) — /i„a;„. If we look at Fig. |2l we see that 
the projection proj^^j^^-, (y) moves y onto i^„(x„), while the 
subtraction of algebraically fixes the value of a;„. This 

is necessary because in subsequent steps we will not consider 
the column /i„. 

We now apply the same process to the new n — 1 
dimensional search space Fn{xn). If at some level i, 
minctgBi dist(y, Fi(Q!)) exceeds the current search radius, we 
must move back to level When the search process reaches 
level 1 and fixes xi, it updates the radius to dist{y, Fi{xi)) 
and moves back up to level 2. 

Note that this search process is mathematically equivalent 
to the one described in section [III the difference is that it 
does projections because the generator matrix is not assumed 
to be upper-triangular Computationally the former is more 
expensive than the latter. 

To see the motivation of the SW algorithm for choosing 
a particular column ordering, consider Fig. |2] Suppose the 
search algorithm has knowledge of the residual for the optimal 
solution (the radius of the circle in the diagram). With the 
column ordering chosen in (a), there are two possible choices 
for X2, leading to the two dashed lines F2{—1) and i^2(l) 
which cross the circle. This means that we will need to find xi 
for both of these choices before we can determine which one 
leads to the optimum solution. In (b), there is only one possible 
choice for xi, leading to the only dashed line 1) which 
crosses the circle, meaning we only need to find X2 to find 
the optimum solution. Since the projection resulting from the 
correct choice of X2 will always be within the sphere, it makes 
sense to choose the ordering which maximizes the distance to 
the second best choice for X2 in hopes that the second nearest 
choice will result in a value for minQge2 dist(y, i^2(a)) out- 
side the sphere and the dimensionality can be reduced by one. 
For more detail on the geometry, see fE]. 

The following will give an overview of the SW algorithm 
as given in ||6l but described in a framework similar to what 
was used to describe CH. In the first step to determine the last 
column, for each i = 1 , . . . , n, we compute 

a;- =arg min dist(y, -F,(a)) =arg min |y^gj-a| = [y'^gt] „. , 

(9) 

where the superscript s stands for the SW algorithm. Let 
xf be the second closest integer in Bi to y'^ gi, i.e., if = 
[v"^ g^] Si\x- ■ argmaxjdist(?/,F,(xf)). Then SW 

chooses column j as the last column of the final reordered H, 
updates y by setting y :— W^]f (x^)iy) ~ ^j^j updates G 
by setting gi := proj^ (-q-j (g^) for all i ^ j. After G and y have 
been updated, the algorithm continues to find column n — 1 
in the same way etc. The pseudo-code of the SW algorithm 
is given in Algorithmic] 

[61 did not say how to implement the algorithm and did 
not give a complexity analysis. The parts of the cost we must 
consider for implementation occur in lines |9] and [T9l Note that 
dist{y,F^{xl)) = llprojg^ (y - /i»if ) II2 and projj,^.(o)(gz) = 
g^ -projg,.gj, where proj^^ = gigj = grgf /M^. A naive 



Algorithm 2 SW Algorithm - Returns p, the column permu- 
tation vector 

1: p :^ 1 : n 

2: p' :-{l,2,...,n} 

3: G ~ H-^ 

4: for /c = n to 2 do 

5: maxDist —1 

6: for i G p' do 

8: xf ■.= [y'^g,]^\^, 

9: dist^=dist(y,F;(xf)) 

10: if distf > maxDist then 

11: maxDist :— distf 

12: j := i 

13: end if 

14: end for 

15: Pk := j 

16: p' := p'\j 

17: 2/ :=projj,^.(^.)(2/) - 

18: for i e p' do 

19: g» :=proj;.^(o)(5*) 

20: end for 

21: end for 

22: pi p' 



implementation would first compute proj^., requiring 71^ flops, 
then compute ||proj^. (y — hixf )\\2 and gi — proj^.g^, each re- 
quiring 2n^ flops. Summing these costs over all loop iterations 
we get a total complexity of 2.5n'* flops. In the next subsection 
we will simplify some steps in Algorithm |2] and show how to 
implement them efficiently. 

C. Algebraic Interpretation and Modifications of SW 

In this section we give new algebraic interpretation of some 
steps in Algorithm 2, simplify some key steps to improve the 
efficiency, and extend the algorithm to handle a more general 
case. All line numbers refer to Algorithm 2. 

First we show how to efficiently compute distf in line |9] 
Observing that gjhi = 1, we have 

distf = \\g,g\{y - h.xDh = \y^ g^ - x||/||g.||2. (10) 

Note that y^ gi and xl have been computed in lines [7] and 
|8] respectively. So the main cost of computing dist^ is the 
cost of computing ||gi|l2, requiring only 2n flops. For k — n 
in Algorithm 2, y'^gi — y^H^^Ci — {H~^y)^ei, i.e., 
y'^gi is the i*'' entry of the real solution for Hx = y. The 
interpretation can be generalized to a general k. 
In line [19] Algorithm 2, 

gT^ = wo]F,{0){9i) 

= {I-Woig^)g^ = 5* -5j(5j5i/ll5jll2)- (11) 

Using the last expression for computation needs only 4?! flops 
(note that \\gj\\2 has been computed before, see (fTOll). We can 
actually show that the above is performing updating of G, the 



Moore-Penrose generalized inverse of H after we remove its 
j*^ column. For proof of this, see 13. 
In line [17] of Algorithm 2, 

?/"^* = proj^^.(^.)(2/) - hjx'^^iy - g^g^iy - hjx'^)) - h^x] 

= {I-Vm,,){v-h,x';)- (12) 

This means that after Xj is fixed to be combined 
with y (the same as CH does) and then the vector is projected 
to the orthogonal complement of the space spanned by g^. 
We can show that this guarantees that the updated y is in 
the subspace spanned by the columns of H which have not 
been chosen. This is consistent with the assumption that H is 
nonsingular, which implies that the original y is in the space 
spanned by the columns of iJ. However, it is not necessary to 
apply the orthogonal projector / — proj^^ to y — /ija;| in ( fT2] i. 
The reason is as follows. In Algorithm 2, t/"^* and gf^^ will 
be used only for computing (see line |7]l. But 

from dn) and ^ 

(ynew)T^new ^ _ ^^^s_^T „ p^^j^^ _ p^^j^^ 

= {y-h,x^^''gT^. 

Therefore, line [17] can be replaced by y := j/ — /ija;|. This 
not only simplifies the computation but also is much easier to 
interpret — after Xj is fixed to be hjX^^ is combined into y 
as what the CH algorithm does. Let H- i-n-i denote H after 
its J*'* column is removed. We then continue to work on the 
subproblem 

min IIj/ - iJ:,i:„_ii||2, (13) 

where 1 satisfies the corre- 

sponding box constraint. Here H-,i-n^i is not square. But there 
is no problem to handle it, see the next paragraph. 

In 0, 7? is assumed to be square and non-singular. In our 
opinion, this condition may cause confusion, since for each 
k except fc = ri in Algorithm 2, the remaining columns of 
H which have not been chosen do not form a square matrix. 
Also the condition restricts the application of the algorithm to 
a general full column rank matrix H, unless we transform H 
to a nonsingular matrix R by the QR factorization. To extend 
the algorithm to a general full column rank matrix H, we need 
only replace line[3]by G :— (ff^)^. This extension has another 
benefit. We mentioned before that the updating of G in line 
[19] is actually the updating of the Moore-Pernrose generalized 
inverse of the matrix formed by the columns of H which have 
not been chosen. So the extension makes all steps consistent. 

To reliably compute G for a general full column rank H, 
we can compute the QR factorization H — QiR hy the 
Householder transformations and then solve the triangular 
system RG^ = to obtain G. This requires (5m — 4n/3)n^ 
flops. Another less reliable but more efficient way to do this is 
to compute G = H{H^ H)^^ . To do this efficiently we would 
compute the Cholesky factorization H^H = R^ R and solve 
R^ RG^ — for G by using the triangular structure of R. 
The total cost for computing G by this method can be shown 
to be 3mn^ + If H is square and nonsingular, we would 



use the LU factorization with partial pivoting to compute H~ 
and the cost is 2n^ flops. 

For the rest part of the algorithm if we use the simplification 
and efficient implementations mentioned above, we can show 
that it needs Amn^ flops. 

We see the modified SW algorithm is much more effi- 
cient than both the CH algorithm and the SW algorithm 
implemented in a naive way we mentioned in the previous 
subsection. 

IV. Equivalence of CH and SW 

In this section we prove that CH and the modified SW 
produce the same set of permutations for a general full column 
rank H. To prove this it will suffice to prove that xf — xf, 
if = x'i, distj — dist^ for i — l,...,n in the first step 
which determines the last column of the final reordered H 
and that the subproblems produced for the second step of each 
algorithm are equivalent. 

Proving xf = xf is not difficult. The only effect the 
interchange of columns i and n of i? in CH has on the real LS 
solution is that elements i and n of the solution are swapped. 
Therefore xf is just the z*'' element of the real LS solution 
rounded to the nearest integer in Bi. Thus, with (O and (O, 

x'r = [{H^yr,^B, = [eIH^y^B, - Lfff yle. = (14) 

Therefore we also have xf = if . 

In CH, after applying a permutation P to swap columns i 
and n of R, we apply V'^, a product of the Givens rotations, to 
bring R back to a new upper triangular matrix, denoted by R, 
and also apply V to y, leading to y = V'^y. Thus R = V^RP 
and y = V^y = V^Qly. Then H = QiR = QiVRP^, 
~ PR-W^Ql, g, = {H^fe, = QiVR-^P^e, = 
QiVR^'^Cn, and Wg^y = ||-R"^e„||2 = l/|r„„|. Therefore, 
with ([Toll and (O 

distf = '^"^f 7^'' = \fnn\\y'^QlVR-^en " xf | (15) 
~ \^nn\\yn /'f^rui ^i] — |^nn(Cn ) I — dist,^'. 

Now we consider the subproblem ((8) in CH and the sub- 
problem (ITST l in SW. We can easily show that i?i:„_i i:„_i 
in ([8]) is the i?-factor of the QR factorization of H- i-n^iP, 
where H.^i-n^i is the matrix given in (T3[ and P is a 
permutation matrix such that x = Px, and that yi-.n-i in © 
is the multiplication of the transpose of the Qi -factor of the 
QR factorization of H-.i-n-iP and y in il3[ . Thus the two 
subproblems are equivalent. 

V. New Algorithm 

Now that we know the two algorithms are equivalent, we 
can take the best parts from both and combine them to form 
a new algorithm. The main cost in CH is to interchange the 
columns of R and return it to upper-triangular form using 
Givens rotations. When we determine the fc*'* column, we must 
do this k times. We can avoid all but one of these column 
interchanges by computing xf , xf and distf directly. 



After the QR factorization of H, we solve the reduced ILS 
problem (|2]i. We need only consider how to determine the last 
column of the final R. Other columns can be determined sim- 
ilarly. Here we use the ideas from SW. Let G ~ R^^ , which 
is lower triangular. By (IT4l . we compute for i = 1, . . . , n 



Algorithm 3 New algorithm 



1: Compute the QR factorization of H by Householder 



— \y ~ \yi:n^i-n,i \ jc^. 7 -^i — \yi:nGi:n,i\ 



dist, 



\Gi-r, 



Let j — arg max^ dist^. We take a slightly different approach 
to permuting the columns than was used in CH. Once j is 
determined, we set yi:„_i := j/im-i — ri-n-i. jXj. Then we 
simply remove the j*'' column from R, and restore it to upper 
triangular using Givens rotations. We then apply the same 
Givens rotations to the new y. In addition, we must also update 
the inverse matrix G. This is very easy, we can just remove 
the j*'' column of G and apply the same Givens rotations 
that were used to restore the upper triangular structure of R. 
To see this is true notice that removing column j of R is 
mathematically equivalent to rotating j to the last column and 
shifting columns j, + 1, . . . , n to the left one position, since 
we will only consider columns l,2,...,n — lin subsequent 
steps. Suppose P is the permutation matrix which will permute 
the columns as described, and V'^ is the product of Givens 
rotations to restore R to upper-triangular Let R = V^RP 
and set G = R^^. Then 

G = {V^RPy'^ = V^R-^P = V'^GP. 

This indicates that the same V and P, which are used to 
transform R to R, also transform G to G. Since G is lower tri- 
angular, it is easy to verify that G'i:„_i4:„_i — RiZ-i.i-.n-i- 
Both i?i:n-i,i:n-i and Gi:„_i_i:„_i will bc used in the next 
step. 

After this, as in the CH algorithm, we continue to work on 
the subproblem of size n—1. The advantages of using the ideas 
from CH are that we always have a lower triangular G whose 
dimension is reduced by one at each step and the updating of 
G is numerically stable as we use orthogonal transformations. 
We give the pseudocode of the new algorithm in Algorithm |3] 

Here we consider the complexity analysis of our new al- 
gorithm. If we sum the costs in algorithm |3] over all loop 
iterations, we get a total of ^j- + 2mn^ flops in the worst case. 
The worst case is very unlikely to occur, it arises when j = I 
each iteration of the outer loop. In the average case however, 
j is around fc/2 and we get an average case complexity of 
^ — h 2mn^ flops. In both cases, the complexity is less than 
the complexity of the modified SW algorithm. 

VI. Summary 

We showed that two algorithms for the column reordering 
of the box-constrained ILS problem are equivalent. To do that, 
we modified one algorithm and gave new insight. We proposed 
a new algorithm by combining the best ideas from both of the 
originals. Our new algorithm is more efficient than either of the 
originals and is easy to implement and understand. Since the 
three algorithms are theoretically equivalent and were derived 



Qi 



H 



(2(m - n/3)n'^ flops) 
C-T flops) 



(2(fc - i) flops) 
(2(fc - i) flops) 



22: 
23: 



transformations: 
and compute y : 

G := R-^ 
p := I : n 
p' := I : n 
for /j = n to 2 do 

maxDist —1 

for z = 1 to fc do 

■■= [ale. 

disti = la - a;i|/||Gi:fc,i||2 
if disti > maxDist then 
maxDist := disti 
j := i 
end if 
end for 
Pk := p'j 

Interchange the intervals Bk and Bj 
Interchange entries k and j in p' 

Set y := yi-.k-i - Rv.k-ijXj 

Remove column j of R and G, and return R and 
G to upper and lower triangular by Givens rotations, 
respectively, and then remove the last row of R and G. 
The same Givens rotations are applied to y. 

(6k{k - j) flops) 

end for 

Pi = P'l 



through different motivations, we now have both geometrical 
and algebraic motivations for the algorithms. 
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